ABSTRACT


Handwritten character recognition is a difficult task for scripts with
complex-shaped alphabets such as Bangla. Because optical character
recognition (OCR) is widely used on mobile devices, a recognition model
needs to be suitable for mobile applications. Much research has been carried
out in this area, but existing methods either did not achieve satisfactory
accuracy or could not recognize more than 200 characters. MobileNet is a
state-of-the-art convolutional neural network (CNN) architecture designed
for mobile devices, as it requires less computing power. In this paper, we
used MobileNet for Bangla handwritten character recognition. It achieved
96.46% accuracy in recognizing 231 classes (171 compound, 50 basic and
10 numerals), 96.17% accuracy on the 171 compound character classes, 98.37%
accuracy on the 50 basic character classes and 99.56% accuracy on the
10 numeral classes.


MobileNet 


Optical character recognition

This is an open access article under the CC BY-SA license.
1. INTRODUCTION

Convolutional neural networks (CNNs) have recently become very popular in image recognition.
However, most of the models are bulky and need a lot of time to train, whereas modern uses such as
mobile applications need light and fast models. In OCR applications, models also need to provide high
precision despite having a large number of classes to distinguish between. MobileNet is a state-of-the-art
CNN architecture that is small, fast and provides high accuracy. In this paper, we introduce
a MobileNet-based architecture that can recognize Bangla handwritten characters.

Handwritten character recognition (HCR) is a significant part of an OCR system and a popular
research area in its own right. HCR has advanced considerably for languages such as English and Arabic.
Bangla is a widely spoken language with around 250 million speakers and is the official language of
Bangladesh. Handwritten character recognition for the Bangla alphabet therefore needs to be advanced
and accurate.

The shapes of Bangla characters are very complex. Some characters are written in different patterns
by different people, and some are distinguished only by a single stroke called a ‘Matra’.
Bangla script contains around 50 basic characters and 10 numeric characters. The basic characters also
combine to form compound characters, of which Bangla script contains more than 300. Because of
the large number of classes and the complexity and similarity of the shapes, recognition of Bangla
characters is a very tough task. Figure 1 illustrates some complex-shaped basic and compound Bangla characters.

Figure 1. Some Bangla characters

Bangla handwritten characters are very cursive and complex shaped. There are more than
400 compound characters that need to be classified, and some characters can even be written in
two forms, so it is hard to distinguish one from another. A lot of research has been carried out on
English and Arabic handwritten character recognition, but it is limited for the Bangla character set.
The model also needs to be fast and lightweight so that it can be used on low-configuration devices.
MobileNet can be a very good solution, as it is fast, lightweight and capable of high precision in large
classification tasks. We propose a lightweight and fast Bangla handwritten character recognition (BHCR)
architecture. Our system can successfully recognize 231 Bangla character classes.


2. RELATED WORKS 

Before 2010, research on BHCR was confined to basic and numeral character detection. After 2015,
some researchers started to work with compound characters, but there is still no work that can classify
all the characters together. Zhang [1] showed how a convolutional neural network can recognize Chinese
handwritten characters, which are very cursive and complex shaped. He showed that a deep CNN extracts
higher-level features as the depth of the convolutional layers increases, that adding a convolutional
layer improves performance more than adding an extra dense layer, and that more filters help to achieve
better performance. He also showed that deep networks achieve better performance and that adding
a convolutional layer after the 5th conv layer provides more benefit than adding one after the 7th/8th
conv layer. He achieved 97.3% accuracy in classifying 200 classes and 95.5% accuracy in classifying
3755 characters.

Das et al. [2] used MLP and SVM classifiers with shadow, longest-run and quad-tree features to classify
50 basic and 43 compound characters, 93 characters in total. This method achieved 80.86% precision with
the SVM classifier. Sarkhel et al. [3] classified around 231 characters using a region sampling method in
which the most discriminating portion of the image was selected to distinguish it from other characters,
with an SVM as classifier. It achieved 72.87% precision in 384-class classification. Pramanik [4]
introduced a shape decomposition-based architecture in which compound characters were converted into
basic characters, which reduced complexity; an MLP was used as classifier. They recognized 171 compound
characters and obtained 88.74% accuracy. Das et al. [5] used a convex hull-based feature extraction
method for classifying 50 basic characters and 10 numerals. Das et al. [6] proposed an improved feature
set containing 132 features, to which modified shadow features, quad tree-based longest-run features,
distance-based features, and octant and centroid features were added. An MLP with one hidden layer was
used for classification, and accuracy improved from 75.05% to 85.40% on 50 basic character classes.
Basu [7] proposed a word segmentation process to extract characters, together with a new feature
descriptor for this purpose. Bhowmik et al. [8] used an SVM-, RBF- and MLP-based method for classifying
45 basic character classes. Each image was first classified into a group, and then the original class
label was extracted. This architecture was more effective than traditional SVM-based methods. Parui [9]
proposed a hidden Markov model (HMM) and tried a stroke-based technique in which 54 groups of strokes
were identified, 6 groups of strokes were generated and a distinct HMM was assigned to each stroke.
They achieved 87.7% classification accuracy on the test set. Roy [10] introduced a process for
generating a database of strokes from handwritten characters, in which characters were constructed from
their strokes. The classification precision was 96.85% for the isolated strokes. Sazal et al. [11]
implemented a deep belief network that took raw images as input and processed them through unsupervised
learning followed by supervised fine-tuning, classifying 50 basic and 10 numeral characters at
90.27% accuracy.

Tapotosh Ghosh et al. [12] reviewed existing work on Bangla handwritten character recognition
(BHCR) and showed that CNN-based methods achieved better results than other architectures. Roy et al. [13]
introduced a neural network architecture that used a layer-wise training method and the RMSprop
optimizer, which is capable of achieving faster convergence. They obtained 90.33% precision in
recognizing 173 Bangla handwritten character classes. Ashiquzzaman et al. [14] proposed a CNN
architecture in which overfitting was reduced by dropout and the vanishing gradient problem was
reduced by the ELU activation function. They classified 171 characters and obtained 93.68% precision.
Fardous et al. [15] developed a CNN model consisting of 8 conv layers, 2 fully connected layers and
4 pooling layers. They used ReLU as the activation function to introduce non-linearity and dropout
to reduce overfitting. They achieved 95.5% precision in 171 compound character classification.
Saha et al. [16] proposed a CNN model in which a different number of filters was used in every layer to
acquire the necessary features from the image, following the GoogLeNet architecture. They classified
only 84 characters, with 97.21% precision. Rabby et al. [17] introduced a 22-layer CNN architecture to
classify 122 classes: 50 basic characters, 52 compound characters, 10 numerals and 10 modifiers.
They obtained 97.73% accuracy on the mixed set. Alif et al. [18] experimented with the ResNet
architecture to recognize 173 characters, using dropout between layers and the Adam optimizer. They
acquired 95.18% precision with this ResNet model. Alom et al. [19] evaluated different state-of-the-art
DCNN models for BHCR. They used CMATERdb [19] for testing and obtained the best accuracy with DenseNet [20].


3. MOBILENET V1 ARCHITECTURE 

MobileNet is a lightweight model with 4,253,864 parameters. It takes 224x224x3 images as input
and uses depthwise separable convolution blocks rather than standard convolutional blocks to extract
features [21]. A depthwise separable convolution consists of a depthwise convolution followed by
a pointwise convolution. The depthwise convolution applies one filter to each input channel and extracts
features from the image; the pointwise convolution is a 1x1 convolution that combines the outputs of
the depthwise convolution. Each of these layers is followed by a batch normalization layer and a ReLU
activation layer. In a standard convolution, filtering the image and combining the outputs is done by
a single layer [22], which is computationally more expensive. A depthwise separable convolution
therefore contains fewer parameters and is more efficient than a standard convolution layer.
Figure 2 shows the difference between depthwise separable convolution and standard convolution.
The MobileNet architecture is provided in Figure 3.

Figure 2. (a) Depthwise convolution [21], (b) Standard convolution [22]

Figure 3. MobileNet architecture [21]
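
The parameter saving described in this section can be checked with a few lines of arithmetic. The sketch below counts the weights of one standard convolution layer and of the equivalent depthwise plus pointwise pair; the layer sizes (3x3 kernel, 64 input channels, 128 output channels) are illustrative assumptions and are not taken from the paper, and bias terms are omitted.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias omitted)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that combines the channels
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not taken from the paper).
kernel, c_in, c_out = 3, 64, 128
standard = standard_conv_params(kernel, c_in, c_out)
separable = depthwise_separable_params(kernel, c_in, c_out)
print(standard, separable, round(separable / standard, 3))
# prints: 73728 8768 0.119
```

For this assumed layer, the separable block needs roughly 12% of the weights of the standard layer, which matches the ratio 1/c_out + 1/kernel^2 that follows from the two formulas.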


4. RESEARCH METHOD 

The whole work was conducted as shown in Figure 4. First, a large Bangla handwritten character
dataset was collected. The dataset was cleaned and then fed to the MobileNet model for training.
Around 20% of the images were kept aside for testing, and the trained model was then evaluated on this test set.

Figure 4. Workflow followed in this research


4.1. Dataset preparation 

There are four large databases containing Bangla handwritten characters [19, 23-25]. Among them,
CMATERdb [11] has the largest number of classes. We used the CMATERdb [19] 3.1.1, 3.1.2 and 3.1.3
databases to acquire images of 231 classes. The images mostly had a grey background and differed
in size. The dataset was divided into training and test sets. First, we cleaned the dataset to
remove mislabeled images. Then, we converted the images to binary and resized them to the same
shape (28x28). Figure 5(a) shows some samples from the original dataset, and Figure 5(b) shows
the result of preprocessing. Table 1 provides the number of training, validation and testing images used
for evaluating this model.


Figure 5. (a) Before preprocessing, (b) After preprocessing
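
The binarization and resizing steps described above can be sketched as follows. This is a minimal pure-Python illustration and not the authors' pipeline: the fixed threshold of 128, the pixel polarity (values at or above the threshold map to 1) and the nearest-neighbour resizing are all assumptions, since the paper does not state which methods were used. The function names are hypothetical.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/1 pixels."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in gray]


def resize_nearest(image, out_h=28, out_w=28):
    """Resize a 2-D image to out_h x out_w with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def preprocess(gray, threshold=128, size=28):
    """Binarize first, then bring the image to a common size x size shape."""
    return resize_nearest(binarize(gray, threshold), size, size)


# Synthetic 56x40 grayscale image: dark left half, light right half.
sample = [[30] * 20 + [220] * 20 for _ in range(56)]
result = preprocess(sample)
print(len(result), len(result[0]), result[0][0], result[0][27])
# prints: 28 28 0 1
```

In practice an image library would normally handle both steps; the sketch only makes the order of operations (binary conversion, then a common 28x28 shape) explicit.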





Table 1. Dataset splitting

Dataset                      No. of classes   Training images   Validation images   Test images
Basic + Compound + Numeral   231              38,807            9,612               11,503
Compound                     171              27,486            6,793               8,123
Basic                        50               9,411             2,345               2,896
Numeral                      10               1,910             474                 484


4.2. Training and testing 

We trained the MobileNet architecture four times: on the mixed set, the compound set, the basic set
and the numeral set. Each time, we set the number of neurons in the last dense layer to the number of
classes. We trained the model with the Adam optimizer, a learning rate of 0.001, categorical
cross-entropy loss and 80 epochs. The model converged after 60 epochs, as can be seen in Figure 6,
which shows the validation and training accuracy with respect to the number of epochs. After training,
we tested the model on a completely isolated test set. Table 2 gives the time required to complete
an epoch using MobileNet.
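
The training setup described above might be expressed as in the following sketch. The paper does not name the framework, so the use of Keras (`tf.keras.applications.MobileNet`) is an assumption, as is training from scratch with `weights=None`; the input shape follows Section 3, and how the 28x28 preprocessed images were fed to a 224x224x3 network is not stated in the source. The helper names `training_config` and `build_mobilenet` are hypothetical.

```python
def training_config(num_classes):
    """Hyperparameters stated in Section 4.2, gathered in one place."""
    return {
        "num_classes": num_classes,  # size of the final dense layer
        "learning_rate": 0.001,
        "loss": "categorical_crossentropy",
        "epochs": 80,
    }


def build_mobilenet(config, input_shape=(224, 224, 3)):
    """Build and compile an untrained MobileNet for the given configuration."""
    # Imported here so the configuration can be inspected without TensorFlow installed.
    import tensorflow as tf

    model = tf.keras.applications.MobileNet(
        input_shape=input_shape,
        weights=None,                   # assumption: no pretrained weights
        classes=config["num_classes"],  # final dense layer matches the class count
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=config["learning_rate"]),
        loss=config["loss"],
        metrics=["accuracy"],
    )
    return model


# One configuration per experiment: mixed, compound, basic and numeral sets.
configs = {
    name: training_config(n)
    for name, n in [("mixed", 231), ("compound", 171), ("basic", 50), ("numeral", 10)]
}
print(configs["mixed"])
```

A model for the mixed set would then be obtained with `build_mobilenet(configs["mixed"])` and trained with `model.fit(...)` for `configs["mixed"]["epochs"]` epochs on whatever training and validation data objects the reader has prepared.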


Table 2. Time required to train

Classification task      Seconds per epoch
Numerals (10 classes)    15
Basic (50 classes)       70
Compound (171 classes)   212
Mixed (231 classes)      276
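
As a rough reading of Table 2, the per-epoch times can be turned into total training times. The short calculation below assumes that all 80 epochs from Section 4.2 were run and that every epoch took the time listed in the table; both are simplifying assumptions, and the result covers training only.

```python
SECONDS_PER_EPOCH = {"numeral": 15, "basic": 70, "compound": 212, "mixed": 276}  # Table 2
EPOCHS = 80  # epoch count from Section 4.2


def total_training_hours(seconds_per_epoch, epochs=EPOCHS):
    """Total wall-clock training time in hours for a fixed number of epochs."""
    return seconds_per_epoch * epochs / 3600


hours = {task: round(total_training_hours(s), 2) for task, s in SECONDS_PER_EPOCH.items()}
print(hours)
# prints: {'numeral': 0.33, 'basic': 1.56, 'compound': 4.71, 'mixed': 6.13}
```

Under these assumptions, the 231-class model needs a little over six hours of training, while the numeral model finishes in about twenty minutes.
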

Figure 6. Training and testing accuracy vs epoch


5. RESULT AND DISCUSSION 

The trained MobileNet model was tested on a completely isolated dataset. In the 231-class
classification, the model achieved 96.56% accuracy on the validation set and 96.46% on the test set.
It obtained 96.26% on the validation set and 96.17% on the test set in the 171 compound character
classification. It provided 98.47% accuracy on the validation set and 98.37% on the test set in basic
character recognition. For numeral classification, it achieved 99.80% and 99.56% on the validation and
test sets, respectively. Table 3 lists the accuracy achieved in the different classification tasks.


Table 3. Accuracy of proposed model in different classification tasks

Classification task   Number of classes   Validation accuracy   Testing accuracy
Numeral               10                  99.80%                99.56%
Basic                 50                  98.47%                98.37%
Compound              171                 96.26%                96.17%
Mixed                 231                 96.56%                96.46%


The results show that the proposed model was very good at classifying the numerals but made
the most errors on compound characters, whose structures are more complex and more similar to each
other than those of numerals, and some of which can be written in different forms. We can compare
the proposed model with existing models according to the number of classes classified and
the accuracy. Table 4 shows this comparison between the existing models and the proposed model.


Table 4. Comparison between existing models and proposed model

Models                               Number of classes   Accuracy
Nibaran Das et al. (2010) [2]        93                  80.86%
Rahul Pramanik et al. (2018) [4]     171                 88.74%
Saikot Roy et al. (2017) [13]        173                 90.33%
A. Ashiquzzaman et al. (2017) [14]   171                 93.68%
A. Fardous et al. (2019) [15]        171                 95.5%
M. A. R. Alif et al. (2017) [18]     173                 95.18%
Rabby et al. (2018) [17]             122                 97.73%
Sourajit Saha et al. (2018) [16]     84                  97.21%
Proposed Model                       231                 96.46%


Rabby et al. [17] achieved 97.73% accuracy in classifying 122 classes and Saha et al. [16]
achieved 97.21% accuracy in classifying 84 classes. Both achieved better accuracy than the MobileNet
model, but they classified fewer classes. In terms of the number of classes, this model therefore
surpasses all the existing models listed in Table 4. Figure 7 shows a comparison between existing models
and our proposed model according to accuracy and number of classes. Among these models, five used
CMATERdb as their database. Among them, the MobileNet model provided the best accuracy and classified
the largest number of classes. A comparison between the existing models that used CMATERdb and
the MobileNet model is provided in Figure 8.

Figure 8. Comparison of different models which are evaluated using CMATERdb


6. CONCLUSION AND FUTURE SCOPE 

Bangla handwritten character recognition is necessary for building an efficient Bangla OCR system.
A better model, capable of classifying the more than 400 commonly used Bangla handwritten characters
with higher accuracy, still needs to be found in order to build a good OCR system for Bangla characters.
The MobileNet architecture can be a solution for BHCR because it is lightweight and computationally
fast. Other state-of-the-art CNN architectures such as InceptionNet or ResNet50 can also be applied;
however, they must be both efficient and lightweight as well. In this paper, we have illustrated
the usefulness of lightweight models and demonstrated that a low-cost model like MobileNet can be
a great solution for recognizing a very cursive and large character set. In future, an unsupervised
technique could be considered in which the handwritten characters are translated to a digitally readable
form, so that the system can easily recognize them from this translated form. This could be
an advantageous approach for preserving the heritage of Bangla literature, as it would allow
an efficient and computationally inexpensive system to convert Bangla handwritten documents to
formatted printed documents automatically.